Using LongSAGE to Detect Biomarkers of Cervical Cancer Potentially Amenable to Optical Contrast Agent Labelling.

Sixteen longSAGE libraries from four different clinical stages of cervical intraepithelial neoplasia (CIN) have enabled us to identify novel cell-surface biomarkers indicative of CIN stage. By comparing gene expression profiles of cervical tissue at early and advanced stages of CIN, we identify several genes as novel genetic markers. We present fifty-six cell-surface gene products differentially expressed during progression of CIN. These cell-surface proteins are being examined to establish their capacity for optical contrast agent binding. Contrast agent visualization will allow real-time assessment of the physiological state of the disease process, bringing vast benefit to cancer care. The data discussed in this publication have been submitted to NCBI's Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/) and are accessible through GEO Series accession number GSE6252.


Introduction
Clinical diagnosis of most cancers and their precursors is predominantly based on phenotypic markers such as the appearance of cell nuclei. Classification and staging of disease are determined by evaluation of gross structural features, such as the extent of local tumour invasion and the presence of disease in other organs. It is now established that cancer arises as a result of successive genetic changes altering cellular processes including growth, angiogenesis, senescence, and apoptosis (Hanahan and Weinberg, 2000). Additionally, many cancers appear to have active inflammation and wound healing mechanisms (Chang et al. 2004). Proteins taking part in these cellular mechanisms are often strong candidates for biomarkers and molecular targets.
Cervical cancer is usually the result of a human papillomavirus (HPV) infection, which initiates neoplastic progression mainly through the viral oncoproteins E6 and E7 within the cervical transformation zone at the squamous/columnar junction. The role of HPV in the pathogenesis of cervical cancer has been addressed in recent reviews (zur Hausen, 2002; Woodman et al. 2007). Many HPV types produce only productive lesions following infection and are not associated with human cancers. In such lesions, the expression of viral gene products is carefully regulated, with viral proteins being produced at defined times and at regulated levels as the infected cell migrates towards the epithelial surface. The events that lead to viral synthesis in the upper epithelial layers appear common to both the low- and high-risk HPV types. Virus-induced cancers most often arise at sites where productive infection cannot be suitably supported. Productive infection can be divided into distinct phases, with different viral proteins playing specific roles (Doorbar, 2006). Upon infection, normal cells gradually advance through stages of cervical intraepithelial neoplasia (CIN). In mild dysplasia (CINI), only a subset of the lower third of the epithelium appears dysplastic; in moderate dysplasia (CINII), the dysplastic cells involve about one-half of the thickness of the cervical epithelium; and in severe dysplasia (CINIII), or carcinoma-in-situ, the entire thickness of the epithelium is disordered but the abnormal cells have not yet spread below the surface. If carcinoma-in-situ is not treated, it will often grow into an invasive cervical cancer. High grade dysplasia is considered the most advanced dysplasia, with atypical changes in many of the cells and a very abnormal growth pattern of the glands; some of the glands are branching or budding. More than 50% of the cells have large, spotted nuclei and are frequently dividing, while the cellular cytoplasm is reduced and looks abnormal.
Cancer of the cervix was once one of the most common causes of cancer death for American women, but between 1955 and 1992 the number of cervical cancer deaths in the United States dropped by 74% due to the introduction of the Pap test (Papanicolaou and Traut, 1943). Death rates from cervical cancer continue to decline by nearly 4% per year. Even so, the American Cancer Society reports that in 2006 about 3,700 of the 9,710 women diagnosed with cervical cancer in the United States died from this disease.
HPV infection causes changes in expression levels of a wide variety of genes (Yim and Park, 2006). These differences in gene expression between pre-invasive neoplastic and non-neoplastic tissue give clues to the molecular basis of cancer. Early detection of cervical cancer based on molecular characterization would be clinically advantageous; risk of neoplastic lesion progression could be predicted and response to therapy could be monitored in real time at a molecular level. To monitor the molecular characteristics of cancer, the ability to optically image its molecular features in vivo and in real time is critical (Rajadhyaksha et al. 1999; White et al. 1999; Huzaira et al. 2001; Langley et al. 2001; Selkin et al. 2001; Collier et al. 2002) and requires safe, molecular-specific contrast agents whose images can be monitored rapidly and non-invasively during their uptake and distribution. The analysis presented here evaluates serial analysis of gene expression (SAGE) libraries to identify novel, cell-surface gene products. Once highly differentially expressed SAGE tags are mapped to their corresponding genes, the gene products become candidates for antibody testing and optical contrast agent development.

Contrast Agents and Optical Imaging
Short of prevention, improved early stage cancer diagnosis would provide the greatest benefit for cancer patients. Because proteins may regulate gene expression, ligand-binding properties, molecular structure and dynamics on a temporal basis, protein biomarkers have a significant impact in cancer detection and therapy as therapies are becoming targeted to specific signal transduction and metabolic pathways. For example, breast cancers respond to HERCEPTIN (trastuzumab) if the tumor overexpresses Her-2/neu (Baselga et al. 2004; Ross et al. 2004). In the same way, GLEEVEC (imatinib) is most effective against cancers carrying the bcr-Abl translocation (Druker, 2004), and targeted molecular cancer therapy is already used successfully for the eradication of acute leukaemia (Frater et al. 2003; Yee and Keating, 2003). These examples imply that it will be important to produce biomarkers for all stages of cancer. Reliable diagnostics such as DNA screening and immunocytochemical analysis of the known cervical neoplasia biomarkers p16INK4A and minichromosome maintenance (MCM) proteins are not implemented in vivo. Real-time biomarkers of the physiological state of the disease process, or markers representative of treatment efficacy, will bring immeasurable benefit to cancer care in terms of individualized agent selection and dosing. Furthermore, series of agents could be tested to determine empirically the localization of cancer and/or the most effective therapy.
Routine clinical cancer detection employs nonspecific contrast agents such as acetic acid, which enhance nuclear backscattering but are limited by small signal magnitude. The field of molecular imaging is rapidly developing imaging agents with high affinity and specificity for targeted biomarkers. These new agents allow for the possibility of disease detection earlier than is currently feasible (Weissleder, 2001; Jaffer and Weissleder, 2005). For example, cancer metastases missed by conventional anatomically based imaging methods may be detected in patients by molecular imaging (Harisinghani et al. 2003). Optical imaging of tissue can be carried out non-invasively in real time, giving high spatial resolution (<1 µm lateral resolution). A number of optical techniques have been established, including confocal microscopy (Collier et al. 2005), multispectral fluorescence imaging (Andersson-Engels et al. 1997; Ferris et al. 2001), reflectance spectroscopy with polarised and unpolarised light (Sokolov et al. 1999, 2002; Utzinger et al. 2001), multispectral reflectance imaging with polarised and unpolarised light (Ferris et al. 2001; Gurjar et al. 2001), and fluorescence spectroscopy (Gillenwater et al. 1998; Wagnières et al. 1998; Ramanujam, 2000; Sokolov et al. 2002). Together with emerging molecular tools (e.g. DNA screening, tissue proteomic and serum markers), biomarker imaging may soon be used for real-time screening, diagnosis, and detection of disease recurrence and progression (Rudin and Weissleder, 2003).
Contrast agents consist of a biomarker-specific probe molecule, such as an antibody, conjugated to an optically suitable label. By topically applying molecular-specific contrast agents to tissues, the scope of molecular changes that can be probed using optical imaging is significantly enhanced. Presently, contrast agents based on metal nanoparticles, organic fluorescent dyes, and quantum dots coupled to monoclonal antibodies against cancer-specific biomarkers are being developed (Sokolov et al. 2003; Rahman et al. 2005).

SAGE Libraries and Tag Mapping
The SAGE technique is capable of producing a molecular representation of cervical tissue based on expressed genes. SAGE is not dependent on pre-existing databases of expressed genes and so provides an independent view of gene expression profiles within the mRNA populations (Velculescu et al. 1997). SAGE library construction is well documented in the literature (Velculescu et al. 1995, 1997; Madden et al. 2000; Saha et al. 2002; Pleasance et al. 2003; Sander et al. 2005). Several recent gene expression profiles of HPV-infected keratinocytes cultured in vitro and of cervical carcinoma clinical samples have proposed changes in gene expression induced by HPV and in early cervical carcinomas (Thomas et al. 2001; Ruutu et al. 2002; Duffy et al. 2003; Pérez-Plasencia et al. 2005). Some studies have compared normal versus tumor-induced gene expression in cervical samples with the aim of identifying potential tumor markers of clinical value (Shim et al. 1998; Chen et al. 2003).
To identify genes expressed at dissimilar levels in preinvasive neoplastic and non-neoplastic untyped cervical tissue, we analysed sixteen longSAGE libraries: 4 from normal cervical tissue samples, 3 of mild dysplasia (CINI), 3 of moderate dysplasia (CINII), and 6 of severe dysplasia (CINIII), or carcinoma-in-situ. The CIN tissues are positive for MUC16. Raw numbers of longSAGE tags generated and library names are given in Tables 1 and 2. DiscoverySpace (Robertson et al. 2007), an in-house graphical software application backed by a relational database system designed to support SAGE gene expression analysis, was used to query data from over 25 publicly available data sources, as well as internal experimental results. Using DiscoverySpace, selected SAGE tag sequences were mapped to counterpart RefSeq (Pruitt et al. 2000, 2005) genes and confirmed using SAGE tag co-ordinates to establish gene identity through Ensembl (Hubbard et al. 2007; homo_sapiens_core_41_36c). Genes were manually curated (EntrezGene) to ascertain gene identity and gene product localisation. These cervical longSAGE libraries were created from the epithelium of cervical biopsy samples collected just prior to LEEP (Loop Electrosurgical Excision Procedure). Tissue samples were placed into RNAlater and frozen at −80 °C within 10 minutes of being excised from the patient. These longSAGE libraries (Shadeo et al. 2007) have been submitted to the NCBI Gene Expression Omnibus (GEO) repository.
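Because several libraries represent each clinical stage and libraries differ in total tag count, comparing a tag across stages generally requires pooling the libraries of a stage and scaling counts to a common library size. The following is a minimal sketch of that idea only; it is not the DiscoverySpace procedure. The tag names, the counts, and the per-100,000 scale are illustrative assumptions.

```python
from collections import Counter

# Assumption: counts are expressed per 100,000 sequenced tags.
SCALE = 100_000

def pool_libraries(libraries):
    """Sum raw tag counts across all libraries of one clinical stage."""
    pooled = Counter()
    for lib in libraries:
        pooled.update(lib)
    return pooled

def normalize(counts, scale=SCALE):
    """Scale raw tag counts so that the library total equals `scale`."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tag: n * scale / total for tag, n in counts.items()}

# Made-up example: two small "normal" libraries with placeholder tags.
normal_libs = [{"TAG_A": 30, "TAG_B": 70}, {"TAG_A": 20, "TAG_B": 80}]
pooled = pool_libraries(normal_libs)
norm = normalize(pooled)
```

Pooling before normalizing weights each library by its sequencing depth; normalizing each library first and then averaging would weight libraries equally instead. Which choice is appropriate depends on the study design and is not specified in the text.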
Any protein differentially expressed in cancer tissue compared to normal tissue, or any protein known to be involved in cancer development, has potential as a candidate cancer biomarker. Genes presenting properties which identify them as likely targets for cancer diagnosis or prognosis must be separated from thousands of other genes which may also possess clinical potential. Hundreds of potential candidates must be set aside in favour of gene products which offer the most promising characteristics. We focus on genes encoding membrane-associated proteins because membrane-bound proteins are most likely to be accessible to topical application of contrast agents and have a rapid time frame for contrast agent visualization. Genes expressing the greatest number of tags combined with high levels of differential expression between dysplastic and normal tissue are the most likely to be observed by contrast agents in vivo.
For optical imaging of tumors by topical application in vivo of contrast agents to be of practical use, a large number of contrast agent receptors are required. One of the standard methods to detect candidate biomarkers is to identify genes with amplified expression in cancer and/or normal tissues. We compared transcription profiles and retained the most highly expressed membrane-bound gene products whose differential expression level is greater than two-fold. Many cell-surface proteins can potentially be developed as targets for optical contrast agents. Using longSAGE, we have also identified the most highly differentially expressed transcripts between disease and normal tissue. Table 1 specifies those genes (with cell-surface gene products) up-regulated in the CINI and CINIII stages of dysplasia and Table 2 lists genes up-regulated in normal tissue. Short descriptions of these protein biomarkers are given in the appendix and annotations of protein structural information, if available, are included. Given that CINII is difficult to determine clinically, it was not included in these comparisons. Contrast agent visualization of the epidermal growth factor receptor (EGFR) using an anti-EGFR monoclonal antibody has already been successful (Rahman et al. 2005). More of these markers should prove amenable to contrast agent development, and topical formulations consisting of a range of contrast agents could help adjust to individual patient differences in gene expression.
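The selection rule described above (membrane-bound, highly expressed, more than two-fold differential expression) can be sketched as a simple filter. This is an illustration under stated assumptions, not the authors' pipeline: the record layout, the gene names, the counts, and the pseudocount used to avoid division by zero are all hypothetical; only the two-fold threshold comes from the text.

```python
from dataclasses import dataclass

PSEUDOCOUNT = 1.0  # assumption: keeps the ratio defined when a tag is absent
MIN_FOLD = 2.0     # two-fold threshold stated in the text

@dataclass
class TagRecord:
    gene: str
    membrane_bound: bool
    normal: float   # normalized tag count in normal tissue
    disease: float  # normalized tag count in dysplastic tissue

def fold_change(rec):
    """Disease/normal ratio; values below 1 mean higher in normal tissue."""
    return (rec.disease + PSEUDOCOUNT) / (rec.normal + PSEUDOCOUNT)

def select_candidates(records, min_fold=MIN_FOLD):
    """Keep membrane-bound genes changed more than min_fold in either
    direction, ordered by their highest expression level."""
    kept = []
    for rec in records:
        fc = fold_change(rec)
        if rec.membrane_bound and (fc > min_fold or fc < 1 / min_fold):
            kept.append(rec)
    return sorted(kept, key=lambda r: max(r.normal, r.disease), reverse=True)

# Made-up records with placeholder gene names.
records = [
    TagRecord("GENE_A", True, 9.0, 99.0),    # 10-fold up in disease
    TagRecord("GENE_B", True, 49.0, 59.0),   # 1.2-fold, below threshold
    TagRecord("GENE_C", False, 4.0, 199.0),  # large change, not membrane-bound
    TagRecord("GENE_D", True, 399.0, 39.0),  # 10-fold up in normal tissue
]
candidates = [r.gene for r in select_candidates(records)]
```

Here `candidates` contains the two membrane-bound genes that pass the threshold, with the more abundant one first. Genes higher in normal tissue are kept alongside genes higher in disease, mirroring the split between Table 2 and Table 1.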

Cervical Intraepithelial Neoplasia Stage Biomarkers
A marker can be evaluated simply for presence or absence, but correlating a marker or array of markers with changes in cellular localization relative to other markers is probably the most interesting and beneficial approach in terms of dysplasia progression, environment, therapy selection, and follow-up. The known function of these genes grants some insight into the biology of cervical neoplasia. For instance, several of these cell-surface markers are involved in transport and/or signaling. MUCX and CD74, upregulated in CINI and CINIII, have signaling gene products known to be associated with carcinomas. CD74 is also known to be a high affinity binding protein for macrophage migration-inhibitory factor (MIF), which is implicated in tumor cell growth and angiogenesis. TSPAN1, upregulated almost 10-fold in CINIII, also plays a role in cell motility and growth. See Appendix for gene-specific references.
Our analysis of cervical cancer longSAGE expression profiles directs attention to some genes with relatively equal distribution in CINI and CINIII, such as PIGR. Another marker, ANPEP, is present at significantly different expression levels in CINI and CINIII. This knowledge expands the possibilities for rapid visualization between normal tissue and stages of dysplasia in vivo. As discussed earlier, these cell-surface targets were found by identifying differentially expressed genes. More often than not, a highly expressed tag is not localised to the cell surface and, for these purposes, does not warrant further attention. However, a highly differentially expressed gene whose gene product is not membrane-bound is sometimes found to be part of a mechanism which affects the cell surface, and thereby the gene product becomes of potential use. TFF3 (trefoil factor 3), for instance, is up-regulated 13-fold in CINIII and 27-fold in CINI. Members of the trefoil family are characterised by having at least one copy of the trefoil motif, a 40-amino acid domain that contains three conserved disulphides. They are stable, secretory proteins whose functions are not defined but may protect the mucosa from insults, stabilize the mucus layer and affect healing of the epithelium. VANGL1 (Van Gogh-like protein 1) is an integral membrane protein which is serine/threonine phosphorylated and translocated to cytoplasmic vesicles in response to TFF3 stimulation (Kalabis et al. 2006). VANGL1 protein acts as a downstream effector of TFF3 signalling and regulates wound healing of intestinal epithelium. TFF3 is commonly expressed in hepatocellular carcinoma and its expression correlates with tumor grade. TFF3 overexpression may be a critical process in mouse and human hepatocellular carcinogenesis (Okada et al. 2005).
The group of trefoil factor peptides (TFF1-3) are part of the protective mechanism operating in the intestinal mucosa and play a fundamental role in epithelial protection, repair, and restitution (Vieten et al. 2005). TFF3 and the essential tumor angiogenesis regulator VEGF exert potent proinvasive activity through STAT3 signalling in human colorectal cancer cells (Rivat et al. 2005). That VANGL1 returned to cell membranes within 45 minutes of TFF3 stimulation (Kalabis et al. 2006) could explain the low VANGL1 tag counts, 1-3 tags per library, observed in the longSAGE libraries.
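The distinction drawn in this section between markers with similar up-regulation in CINI and CINIII and markers that differ between stages can be expressed as a ratio test on the two fold changes. The sketch below is illustrative only. The tolerance of 1.5 is an arbitrary assumption, so the labels it produces are not findings of the study; the TFF3 fold changes (27-fold in CINI, 13-fold in CINIII) are taken from the text, and the second example uses made-up values.

```python
# Assumption: fold changes within a factor of 1.5 of each other are "similar".
TOLERANCE = 1.5

def compare_stages(fold_cin1, fold_cin3, tolerance=TOLERANCE):
    """Label a marker by how its CINI and CINIII fold changes compare."""
    if fold_cin1 <= 0 or fold_cin3 <= 0:
        raise ValueError("fold changes must be positive")
    ratio = fold_cin1 / fold_cin3
    if 1 / tolerance <= ratio <= tolerance:
        return "similar in CINI and CINIII"
    return "higher in CINI" if ratio > 1 else "higher in CINIII"

# TFF3 values from the text: 27-fold in CINI, 13-fold in CINIII.
tff3_ratio = round(27.0 / 13.0, 2)
tff3_label = compare_stages(27.0, 13.0)

# Hypothetical marker with nearly equal up-regulation in both stages.
even_label = compare_stages(5.0, 4.5)
```

A marker labelled as similar across stages would mainly help separate dysplastic from normal tissue, while one that differs between stages could in principle help distinguish CINI from CINIII.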

Conclusions
Molecular-specific contrast agents may provide the ability to directly image the cancer process, but biomarker discovery can be a lengthy process, as candidate markers suitable for the task at hand must be identified from among thousands of proteins. SAGE-identified biomarkers hold promise for recognition of the stages of neoplasia by proteomic patterns. Optical contrast agents bound to these membrane-bound protein biomarkers will serve as a complement to histopathology, thus allowing more effective determination of tumor borders and non-invasive observation of response to treatment at a molecular level. We present fifty-six cell-surface gene products differentially expressed during progression of cervical intraepithelial neoplasia. Differential gene expression of these biomarkers will allow individualized selection of therapeutic combinations that best target the entire disease-protein system and advance understanding of carcinogenesis.

Appendix
and interleukin 1 receptor, type I (Dower et al. 1986; McMahan et al. 1991). It is an important mediator involved in many cytokine induced immune and inflammatory responses (Boch et al. 2003).
Structure information: 1IRA Complex of the interleukin-1 receptor with the interleukin-1 receptor antagonist (IL1RA) (Schreuder et al. 1997).
ITR G protein-coupled receptor 180. This protein is produced predominantly in vascular smooth muscle cells and may play an important role in the regulation of vascular remodelling (Iida et al. 2003).
LOC644410 FCGR1C Fc fragment of IgG, high affinity Ic, receptor (CD64). Only Fc gamma RI has high affinity for ligand and has a unique third extracellular domain (EC3). Three genes for human Fc gamma RI (A, B, and C) have been characterised; although they are remarkably similar, genes B and C are notably different from A (Ernst et al. 1992).
Structure information: 1E4J Crystal structure of the soluble human FC-gamma receptor III (Sondermann et al. 2000).
LY6E lymphocyte antigen 6 complex, locus E. Acute promyelocytic leukemia (APL) is a human malignancy that responds to differentiation therapy with all-trans-retinoic acid (ATRA) (Huang et al. 1988). ATRA induces the expression of a novel human gene, RIG-E (Mao et al. 1996). The amino acid composition of its product indicates that it is membrane-associated and has high homology to the murine LY-6 proteins and weak homology with a number of human growth factor receptors.
LYNX1 Ly6/neurotoxin 1. Ly-6/neurotoxin gene family members are lymphocyte antigens that attach to the cell surface by a glycosylphosphatidylinositol anchor and have a unique structure displaying 8-10 conserved cysteine residues (Tsuji et al. 2003). Functional analysis indicates that LYNX1 can enhance nicotinic acetylcholine receptor function in the presence of acetylcholine (Arredondo et al. 2006). It is a new marker for human breast cancer (Lee et al. 2006).
LYPD3 LY6/PLAUR domain containing 3 (C4.4A). This protein is known to be a structural homologue of the urokinase-type plasminogen activator receptor (uPAR) but little is known about its function (Hansen et al. 2004).
MAL mal, T-cell differentiation protein. The protein encoded by this gene is a highly hydrophobic integral membrane protein belonging to the MAL family of proteolipids (Llorente et al. 2004; Dukhovny et al. 2006).
MUC1 mucin 1, cell-surface associated. This gene is a member of the mucin family and encodes a membrane-bound, glycosylated phosphoprotein. The protein is anchored to the apical surface of many epithelia by a trans-membrane domain, with the degree of glycosylation varying with cell type. The protein serves a protective function by binding to pathogens and also functions in a cell signalling capacity (Ren et al. 2006). Over-expression, aberrant intracellular localization, and changes in glycosylation of this protein have been associated with carcinomas (Rabassa et al. 2006; Raina et al. 2006).
Structure information: 2ACM Solution structure of the SEA domain of human mucin 1 (MUC1) (Macao et al. 2006).
MUC16 mucin 16 (CA125), cell surface associated. The CA125 protein core is composed of a short cytoplasmic tail, a trans-membrane domain, and an extraordinarily large glycosylated extracellular structure. The extracellular domain encompasses an interactive disulfide-bridged cysteine-loop and the site of OC125 and M11 binding (O'Brien et al. 2001). It is known to be a marker in several cancers, including ovarian (Yin et al. 2002), renal (Bamias et al. 2003), and lung (Pollan et al. 2003).
PERP PERP, TP53 apoptosis effector. This tetraspan protein localizes to the plasma membrane, rather than to mitochondria, and may stimulate apoptosis (Ihrie and Attardi, 2004).
PIGR polymeric immunoglobulin receptor. PIGR mediates trans-cellular transport of polymeric immunoglobulin molecules. The receptor has 5 units with homology to the variable (V) units of immunoglobulins and a trans-membrane region, which also has some homology to certain immunoglobulin variable regions.
Structure information: 1XED Crystal Structure of a Ligand-Binding Domain of the Human Polymeric Ig Receptor, pIgR (Hamburger et al. 2004).